Abstract

Covariance matrices provide an easy platform for fusing multiple features compactly
and as a result have found immense success in several computer vision applications,
including activity recognition, visual surveillance, and diffusion tensor imaging.
An important task in all of these applications is to compute the distance between
covariance matrices using a (dis)similarity function, for which the natural choice
is the Riemannian metric corresponding to the manifold inhabited by these matrices.
As this Riemannian manifold is not flat, the dissimilarities should take into account
the curvature of the manifold. As a result, such distance computations tend to be
slow, especially when the matrix dimensions are large or gradients are required.
Further, the suitability of the metric for efficient nearest neighbor retrieval is an
important requirement in the current era of big data analytics. To alleviate these
difficulties, this paper proposes a novel dissimilarity measure for covariances, the
Jensen-Bregman LogDet Divergence (JBLD). This divergence enjoys several desirable
theoretical properties and at the same time is computationally less demanding than
standard measures. To address the problem of efficient nearest neighbor retrieval on
large covariance datasets, we propose a metric tree framework using K-Means clustering
on JBLD. We demonstrate the superior performance of JBLD on covariance datasets from
several computer vision applications.


1  Introduction 

Recent times have witnessed a steep increase in the utilization of structured data in
several computer vision and machine learning applications, where instead of vectors,
one uses richer representations of data such as graphs, strings, or matrices. A class of
such structured data that has been gaining importance in computer vision is the class of
Symmetric Positive Definite (SPD) matrices, specifically as covariance matrices. These
matrices offer a compact fusion of multiple features and are by now the preferred
data representation in several applications.

A  covariance  descriptor  is  nothing  but  the  covariance  matrix  of  features  from  an 
image  region.  Mathematically, 

Definition 1. Let $F_i \in \mathbb{R}^p$, for $i = 1, 2, \ldots, N$, be the feature vectors from the region
of interest of an image. Then the Covariance Descriptor of this region, $C \in \mathcal{S}^p_{++}$, is
defined as:

$$C = \frac{1}{N-1} \sum_{i=1}^{N} (F_i - \mu_F)(F_i - \mu_F)^T \tag{1}$$

where $\mu_F = \frac{1}{N}\sum_{i=1}^{N} F_i$ is the mean feature vector and $\mathcal{S}^p_{++}$ is the space of $p \times p$
Symmetric Positive Definite (SPD) matrices.
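As an illustration of Definition 1, the following minimal sketch computes a covariance descriptor with NumPy. The function name `covariance_descriptor`, the synthetic feature array, and the small ridge term `eps` are assumptions made for this example only; the ridge is a common safeguard for keeping the estimate strictly positive definite and is not part of equation (1).

```python
import numpy as np

def covariance_descriptor(features, eps=0.0):
    """Covariance descriptor of equation (1).

    features: array of shape (N, p), one feature vector F_i per row.
    eps: hypothetical ridge term (not in the paper) that keeps C strictly
         positive definite when N is small or the features are correlated.
    """
    F = np.asarray(features, dtype=float)
    n, p = F.shape
    centered = F - F.mean(axis=0)          # F_i - mu_F
    C = centered.T @ centered / (n - 1)    # (1/(N-1)) * sum of outer products
    return C + eps * np.eye(p)

# Synthetic stand-in for per-pixel features (e.g. intensity and gradients).
rng = np.random.default_rng(0)
F = rng.normal(size=(500, 5))
C = covariance_descriptor(F, eps=1e-8)
```

With `eps=0.0` the result should agree with `np.cov(F, rowvar=False)`, which uses the same $1/(N-1)$ normalization.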

To bring out the importance of covariance matrices in computer vision, we concisely
review a few applications in which these data descriptors have found immense
success. SPD matrices are fundamental objects in Diffusion Tensor Imaging for mapping
biological tissue structures, with applications to the diagnosis of neuro-psychiatric
illnesses including Alzheimer's disease, brain atrophy, and dementia [1-3]. Covariances
provide a convenient platform for fusing multiple features, are robust to static
noise, and can easily be made invariant to affine image transformations, illumination
changes, or changes in camera parameters. As a result, they are used extensively in
multi-camera object tracking applications [4,5]. Other important applications of covariances
include, but are not limited to, human detection [6], image segmentation [7], texture
segmentation [8], robotics and autonomous vehicle navigation [9], robust face recognition [10],
emotion recognition [11], structure tensors for background subtraction applications [12],
and human action recognition [13]. The use of covariances as data
descriptors is not limited to computer vision; examples include speech recognition [14]
and acoustic compression [15].

However, these successful applications are burdened by a common problem: whenever
distance or similarity computations with covariances are required, the corresponding
algorithms tend to slow down. This is because covariances do not conform to
Euclidean geometry, but rather form a Riemannian manifold. Data points on this manifold
are no longer connected by straight lines, but rather by geodesics along the curvature
of the manifold. As a result, computing similarity between covariance matrices is
non-trivial. But the choice of similarity measure is crucial, especially for a fundamental task
such as Nearest Neighbor (NN) retrieval, which forms the building block for many
applications. For example, for tracking the appearance of people in video surveillance,
the number of database points can lie in the millions, and without efficient similarity
computation, NN retrieval and the subsequent tracking are severely disadvantaged.
Standard NN retrieval techniques such as locality sensitive hashing [16] cannot be
directly applied to covariance datasets without ignoring the manifold structure, resulting
in poor retrieval accuracy. Driven by these concerns, we take a closer look at similarity
computation for covariance matrices, for which we introduce the Jensen-Bregman
LogDet Divergence (JBLD). We discuss theoretical properties of JBLD and then apply
it to the task of rapid NN retrieval on several image databases. Experiments against
state-of-the-art techniques show the advantages afforded by JBLD.

This paper is organized as follows. We start with a review of several similarity
metrics on covariance matrices in Section 2. This is followed by an introduction to the
JBLD measure and an exposition of its properties in Section 3. Section 4 discusses the
application of JBLD to nearest neighbor retrieval on covariances. Towards this end,
we propose a K-Means clustering algorithm using JBLD in Section 4.1. Experiments
and results are presented in Section 5, followed by conclusions in Section 6.

Before we proceed, we briefly describe our notation. We refer to the space of
$d \times d$ Symmetric Positive Definite (SPD) matrices as $\mathcal{S}^d_{++}$. Where
the dimensionality of the matrix is unimportant, an SPD matrix $X$ might be introduced
as $X \succ 0$. The notation $\mathcal{S}^d$ represents the space of $d \times d$ symmetric matrices. We use $|\cdot|$
to denote the matrix determinant, Tr to denote the trace, and $\|\cdot\|_F$ for the matrix Frobenius
norm. Also, $I$ refers to the $d \times d$ identity matrix.

2  Related  Work 

We recall some standard similarity measures for covariance matrices. The simplest but
naive approach is to view $d \times d$ covariance matrices as vectors in $\mathbb{R}^{d(d+1)/2}$, whereby
the standard (dis)similarity measures of Euclidean space can be used (e.g., $\ell_p$-distance
functions, etc.). Recall that covariance matrices, due to their positive definiteness
structure, belong to a special category of symmetric matrices and form a Riemannian
manifold (which is a differentiable manifold associated with a suitable Riemannian metric).
Euclidean distances on vectorized covariances ignore this manifold structure, leading
to poor accuracy [17, 18]. In addition, under this measure symmetric matrices with
non-positive eigenvalues are at finite distances to positive definite covariances. This is
unacceptable for a variety of applications, e.g. DT-MRI [17].

A more suitable choice is to incorporate the curvature of the Riemannian manifold
and use the corresponding geodesic length along the manifold surface as the distance
metric. This leads to the Affine Invariant Riemannian Metric (AIRM) [19, 20], which is
defined as follows: for $X, Y \in \mathcal{S}^d_{++}$,

$$D_R(X, Y) := \|\log(X^{-1/2} Y X^{-1/2})\|_F, \tag{2}$$

where $\log(\cdot)$ is the principal matrix logarithm. This metric enjoys several useful
theoretical properties, and is perhaps the most widely used similarity measure for
covariance matrices. As is clear from (2), symmetric matrices with nonpositive eigenvalues
are at infinite distances. The metric is invariant to inversion and similarity transforms.
Other properties of this metric can be found in [19]. Computationally, this metric can
be unattractive as it requires eigenvalue computations or sometimes even matrix
logarithms, which for larger matrices cause significant slowdowns. A few examples of
applications using large covariances are face recognition [10] ($40 \times 40$) and emotion
recognition [11] ($30 \times 30$).
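To make the cost of (2) concrete, here is a minimal NumPy sketch of the AIRM. It uses the fact that $X^{-1/2} Y X^{-1/2}$ has the same eigenvalues as $L^{-1} Y L^{-T}$ for the Cholesky factor $X = LL^T$, so only one symmetric eigenvalue problem is needed. The names `airm` and `random_spd` are illustrative choices, not taken from the paper.

```python
import numpy as np

def airm(X, Y):
    """AIRM of equation (2) for SPD matrices X and Y."""
    L = np.linalg.cholesky(X)                 # X = L L^T
    Z = np.linalg.solve(L, Y)                 # L^{-1} Y
    M = np.linalg.solve(L, Z.T).T             # L^{-1} Y L^{-T}
    lam = np.linalg.eigvalsh((M + M.T) / 2)   # symmetrize against round-off
    return np.sqrt(np.sum(np.log(lam) ** 2))

def random_spd(d, rng):
    """Hypothetical helper: a well-conditioned random SPD matrix."""
    A = rng.normal(size=(d, d))
    return A @ A.T + d * np.eye(d)

rng = np.random.default_rng(1)
X, Y = random_spd(4, rng), random_spd(4, rng)
A = rng.normal(size=(4, 4)) + 4 * np.eye(4)   # an invertible congruence transform

d_xy = airm(X, Y)
d_affine = airm(A @ X @ A.T, A @ Y @ A.T)     # expected to match d_xy
d_inverse = airm(np.linalg.inv(X), np.linalg.inv(Y))
```

The last two values illustrate the invariances mentioned above: both should equal `d_xy` up to round-off.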

Amongst the many measures that have been proposed to replace AIRM, a closely
related one is the Log-Euclidean Riemannian Metric (LERM). Considering the
log-Euclidean mapping $\log : \mathcal{S}^d_{++} \to \mathcal{S}^d$, Arsigny et al. [17] observed that under this
mapping, the Lie group of SPD matrices is isomorphic and diffeomorphic (smooth
manifolds are mapped to smooth manifolds) to the space of symmetric matrices. That
is, the log is a bijection. Using this mapping, the paper introduces LERM as:

$$D_{le}(X, Y) := \|\log(X) - \log(Y)\|_F. \tag{3}$$

On the positive side, LERM maps SPD matrices to a flat Riemannian space (of null
curvature) so that ordinary Euclidean distances can be used. The metric is easy to
compute, and preserves a few important properties of the AIRM (such as invariance
to inversion and similarity transforms). In addition, from a practical point of view,
since this metric untangles the two constituent matrices from their generalized
eigenvalues, the logarithms of each of these matrices can be evaluated offline, gaining a
computational edge over AIRM. As a result, LERM has found many applications in
visual tracking [21], stereo matching [22], etc. On the negative side, computing matrix
logarithms can dramatically increase the computational costs. The flattening of
the manifold as in LERM often leads to less accurate distance computations, affecting
application performance. A more important problem that one encounters when using
LERM is that its moments (gradients, Hessian, etc.) do not have closed forms.
Moreover, it is computationally difficult even to approximate these moments due to the need
to find derivatives of matrix logarithms. The following proposition shows that LERM
is a lower bound to AIRM. This result will be useful later in this paper.

Proposition 1. For $X, Y \in \mathcal{S}^d_{++}$, we have $D_{le}(X, Y) \leq D_R(X, Y)$. Further, equality
holds only when $X$ and $Y$ commute.

Proof. Since $X, Y$ are positive matrices, we can write them in exponential form
as $X = e^{x}$ and $Y = e^{y}$ respectively, where $x$ and $y$ are symmetric matrices. Now,
recalling that the Riemannian metric $D_R$ is affine invariant, we can rewrite (2) in the
following equivalent form:

$$D_R^2 = \mathrm{Tr}\left(\log^2(e^{y} e^{-x})\right) \tag{4}$$

Invoking the Golden-Thompson inequality [23] and the monotonicity of the log function,
we have the following inequality from (4):

$$D_R^2 = \mathrm{Tr}\left(\log^2(e^{y} e^{-x})\right) \geq \mathrm{Tr}\left(\log^2(e^{y-x})\right) = D_{le}^2.$$

□
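Proposition 1 can be checked numerically. The sketch below is only an illustration on random inputs, not a proof; the helper names and the way the random SPD matrices are generated are assumptions of this example.

```python
import numpy as np

def spd_log(X):
    """Principal logarithm of an SPD matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(X)
    return (V * np.log(w)) @ V.T

def lerm(X, Y):
    """LERM of equation (3)."""
    return np.linalg.norm(spd_log(X) - spd_log(Y), "fro")

def airm(X, Y):
    """AIRM of equation (2), from the eigenvalues of X^{-1} Y."""
    lam = np.linalg.eigvals(np.linalg.solve(X, Y)).real
    return np.sqrt(np.sum(np.log(lam) ** 2))

rng = np.random.default_rng(2)
gaps = []
for _ in range(200):
    A, B = rng.normal(size=(2, 4, 4))
    X, Y = A @ A.T + np.eye(4), B @ B.T + np.eye(4)
    gaps.append(airm(X, Y) - lerm(X, Y))
min_gap = min(gaps)   # Proposition 1 predicts a non-negative value

# Commuting (here: diagonal) matrices should give equality.
Dx, Dy = np.diag([1.0, 2.0, 3.0]), np.diag([0.5, 4.0, 1.5])
commuting_gap = airm(Dx, Dy) - lerm(Dx, Dy)
```

Note that `spd_log` can be applied to each matrix once and cached, which is the offline evaluation advantage of LERM described above.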

Similar to our approach, there have been previous attempts to use symmetrized
f-divergences from information theory to develop distances on SPD matrices. One
such idea is to view the SPD matrices as the covariances associated with zero-mean
Gaussian distributions [18], and then use the symmetrized KL-Divergence Metric
(KLDM) as the distance between the distributions. This leads to the following
definition of KLDM:

$$D_{kl}^2(X, Y) := \frac{1}{2} \mathrm{Tr}\left(X^{-1}Y + Y^{-1}X - 2I\right) \tag{5}$$

This measure does not require matrix eigenvalue computations or logarithms, and at
the same time enjoys many of the properties of AIRM. On the negative side, the measure
requires inversion of the constituent covariances, which can be slow (or can even
lead to instability when the data matrices are poorly conditioned). A bigger concern
is that KLDM can in fact overestimate the Riemannian metric, as the following
proposition shows, and thus can lead to poor accuracy.

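Proposition 2. There exist $X, Y \in \mathcal{S}^d_{++}$ such that $D_{kl} > D_R$.

Proof. Let $v_i$ be the $i$th eigenvalue of $X^{-1}Y$. Since $v_i$ is always positive, we can write
$v_i = e^{u_i}$ for $u_i \in \mathbb{R}$. Then from the definitions of KLDM and AIRM, we have:

$$D_{kl}^2 = \frac{1}{2}\sum_{i=1}^{d}\left(e^{u_i} + e^{-u_i} - 2\right), \qquad D_R^2 = \sum_{i=1}^{d} u_i^2.$$

For a suitable choice of $u_i$, we have the desired result. □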
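A concrete instance of Proposition 2 can be built from scaled identity matrices, for which every $u_i$ equals the log of the scale factor. The sketch below is illustrative; the function names and the specific scale factors $e^1$ and $e^4$ are choices made for this example.

```python
import numpy as np

def kldm_sq(X, Y):
    """Squared KLDM of equation (5)."""
    d = X.shape[0]
    return 0.5 * (np.trace(np.linalg.solve(X, Y))
                  + np.trace(np.linalg.solve(Y, X)) - 2 * d)

def airm_sq(X, Y):
    """Squared AIRM of equation (2), from the eigenvalues of X^{-1} Y."""
    lam = np.linalg.eigvals(np.linalg.solve(X, Y)).real
    return np.sum(np.log(lam) ** 2)

I2 = np.eye(2)
near = np.exp(1.0) * I2   # u_i = 1 for both eigenvalues
far = np.exp(4.0) * I2    # u_i = 4 for both eigenvalues

near_pair = (kldm_sq(I2, near), airm_sq(I2, near))   # KLDM^2 below AIRM^2
far_pair = (kldm_sq(I2, far), airm_sq(I2, far))      # KLDM^2 above AIRM^2
```

For nearby matrices KLDM stays below AIRM, but because its terms grow exponentially in $|u_i|$ it overtakes AIRM once the matrices are far apart.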

A distance on the Cholesky factorization of the SPD matrices is presented in [24].
The idea is as follows: suppose $X = L_1 L_1^T$ and $Y = L_2 L_2^T$ represent the Cholesky
decompositions of $X$ and $Y$ respectively, with lower triangular matrices $L_1$ and $L_2$;
then the Cholesky distance is defined as:

$$D_C(X, Y) = \|L_1 - L_2\|_F. \tag{6}$$

Other similarity measures on covariance matrices may be found in [25]. Despite their
simple formulations and properties close to those of AIRM, the above distances based on
f-divergences have not been very popular in SPD matrix based applications due to their
poor accuracy (as our experiments will later demonstrate).

In contrast to all these metrics, the similarity metric that we propose in this paper
is much faster to compute, as it depends only on determinants of the input matrices,
and thus no eigenvalue computations are required. Moreover, as we will later see, it
also turns out to be very effective empirically.

We  note  that  NN  retrieval  for  covariance  matrices  itself  is  still  an  emerging  area, 
so  literature  on  it  is  scarce.  In  [26],  an  attempt  is  made  to  adapt  NN  techniques  from 
vector  spaces  to  non-Euclidean  spaces,  while  [27]  proposes  an  extension  of  the  spectral 
hashing  techniques  to  covariance  matrices.  However,  both  these  techniques  are  based 
on  a  Euclidean  embedding  of  the  Riemannian  manifold  through  the  tangent  spaces,  and 
then  using  LERM  as  an  approximation  to  the  true  similarity. 

3  Jensen-Bregman  LogDet  Divergence 

We  first  recall  some  basic  definitions  and  then  present  our  similarity  measure:  the 
Jensen-Bregman  LogDet  Divergence  (JBLD).  We  remark  that  although  this  measure 
seems  natural  and  simple,  to  our  knowledge  it  has  not  been  formally  discussed  in  detail 
before.  We  alert  the  reader  that  JBLD  should  not  be  confused  with  its  asymmetric 
cousin:  the  so-called  LogDet  divergence  [28]. 

At the core of our discussion lies the Bregman Divergence $d_\phi : S \times \mathrm{relint}(S) \to [0, \infty)$,
which is defined as

$$d_\phi(x, y) := \phi(x) - \phi(y) - \langle x - y, \nabla\phi(y) \rangle, \tag{7}$$

where $\phi : S \subseteq \mathbb{R}^d \to \mathbb{R}$ is a strictly convex function of Legendre-type on int(dom $S$) [29].

From (7) the following properties of $d_\phi(x, y)$ are apparent: strict convexity in $x$;
asymmetry; non-negativity; and definiteness (i.e., $d_\phi = 0$ iff $x = y$). Bregman
divergences enjoy a host of useful properties [29, 30], but often their lack of symmetry and
sometimes their lack of a triangle inequality can prove to be hindrances. Consequently,
there has been substantial interest in considering symmetrized versions such as
Jensen-Bregman divergences [31-33], where assuming $s = (x + y)/2$,

$$J_\phi(x, y) := \frac{1}{2}\left(d_\phi(x, s) + d_\phi(s, y)\right), \tag{8}$$

or even variants that satisfy the triangle inequality [33, 34].

Both (7) and (8) can be naturally extended to matrix divergences (over Hermitian
matrices) by composing the convex function $\phi$ with the eigenvalue map $\lambda$, and replacing
the inner product in (7) by the trace. We focus on a particular matrix divergence,
namely the Jensen-Bregman LogDet Divergence, which is defined for $X, Y \in \mathcal{S}^d_{++}$ by

$$J_{ld}(X, Y) := \log\left|\frac{X + Y}{2}\right| - \frac{1}{2}\log|XY|, \tag{9}$$

where $|\cdot|$ denotes the determinant; this divergence is obtained from the matrix version
of (8) by using $\phi(X) = -\log|X|$ as the seed function.
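Equation (9) translates almost directly into code. The sketch below is one possible implementation, assuming NumPy; it obtains each log-determinant from a Cholesky factor, which also serves as a check that the inputs are positive definite. The names `logdet_spd` and `jbld` are illustrative.

```python
import numpy as np

def logdet_spd(X):
    """log|X| for an SPD matrix from its Cholesky factor X = L L^T.

    np.linalg.cholesky raises LinAlgError when X is not positive definite.
    """
    L = np.linalg.cholesky(X)
    return 2.0 * np.sum(np.log(np.diag(L)))

def jbld(X, Y):
    """Jensen-Bregman LogDet Divergence of equation (9)."""
    return logdet_spd((X + Y) / 2) - 0.5 * (logdet_spd(X) + logdet_spd(Y))

X = np.array([[2.0, 0.3], [0.3, 1.0]])
Y = np.array([[1.0, -0.2], [-0.2, 3.0]])
value = jbld(X, Y)

# Closed form for X = I, Y = e^2 I in two dimensions: 2*(log((1 + e^2)/2) - 1).
scaled = jbld(np.eye(2), np.e ** 2 * np.eye(2))
```

Because $\log|XY| = \log|X| + \log|Y|$, the log-determinants of the individual matrices can be computed once and reused across many pairwise evaluations.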


3.1  Properties 

For $X, Y, Z \in \mathcal{S}^d_{++}$ and invertible matrices $A$ and $B$, we have the following properties
(see [35] for details and proofs):

1. $J_{ld}(X, Y) \geq 0$ (nonnegativity)

2. $J_{ld}(X, Y) = 0$ iff $X = Y$ (definiteness)

3. $J_{ld}(X, Y) = J_{ld}(Y, X)$ (symmetry)

4. $\sqrt{J_{ld}(X, Y)} \leq \sqrt{J_{ld}(X, Z)} + \sqrt{J_{ld}(Z, Y)}$ (triangle inequality; see [35])

5. $J_{ld}(AXB, AYB) = J_{ld}(X, Y)$ (affine invariance)

6. $J_{ld}(X^{-1}, Y^{-1}) = J_{ld}(X, Y)$ (invariance to inversion)

We would like to remark that $J_{ld}$ can also be written as follows:

$$J_{ld}(X, Y) = \mathrm{Tr}\left(\log\frac{X + Y}{2} - \frac{1}{2}\log(XY)\right) \tag{10}$$

where log is the matrix logarithm. Although this construction of $J_{ld}$ makes it slightly
more expensive computationally, such a formulation could be suitable for some applications.
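The listed properties are easy to spot-check numerically. The sketch below does so on a few random matrices; it is a sanity check under stated assumptions rather than a proof. In particular, the affine invariance test uses $B = A^T$ so that the transformed matrices remain symmetric positive definite, and the helper `random_spd` is hypothetical.

```python
import numpy as np

def jbld(X, Y):
    """J_ld of equation (9), using log-determinants from slogdet."""
    ld = lambda M: np.linalg.slogdet(M)[1]
    return ld((X + Y) / 2) - 0.5 * (ld(X) + ld(Y))

def random_spd(d, rng):
    """Hypothetical helper: a well-conditioned random SPD matrix."""
    A = rng.normal(size=(d, d))
    return A @ A.T + d * np.eye(d)

rng = np.random.default_rng(3)
d = 4
X, Y, Z = (random_spd(d, rng) for _ in range(3))
A = rng.normal(size=(d, d)) + d * np.eye(d)   # invertible transform

checks = {
    "nonnegativity": jbld(X, Y) >= 0,
    "definiteness": abs(jbld(X, X)) < 1e-12,
    "symmetry": np.isclose(jbld(X, Y), jbld(Y, X)),
    "triangle inequality": np.sqrt(jbld(X, Y))
        <= np.sqrt(jbld(X, Z)) + np.sqrt(jbld(Z, Y)) + 1e-12,
    "affine invariance": np.isclose(jbld(A @ X @ A.T, A @ Y @ A.T), jbld(X, Y)),
    "inversion invariance": np.isclose(
        jbld(np.linalg.inv(X), np.linalg.inv(Y)), jbld(X, Y)),
}
```

Each entry of `checks` is expected to be true for these inputs; note that the triangle inequality is tested on the square root of $J_{ld}$, as in property 4.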

Theorem 3 (Non-Convexity). Assuming $X, Y \succ 0$, for a fixed $Y$, $J_{ld}(X, Y)$ is convex
for $X \preceq (1 + \sqrt{2})Y$ and concave for $X \succeq (1 + \sqrt{2})Y$.

Proof. Taking the second derivative of $J_{ld}(X, Y)$ with respect to $X$, we have

$$\nabla_X^2 J_{ld}(X, Y) = -(X + Y)^{-1} \otimes (X + Y)^{-1} + \frac{1}{2} X^{-1} \otimes X^{-1}. \tag{11}$$

This expression is positive for $X \preceq (1 + \sqrt{2})Y$ and negative for $X \succeq (1 + \sqrt{2})Y$. □

3.2  Nearest  Isotropic  Matrix 

As we alluded to earlier, diffusion tensor imaging is the process of mapping the diffusion of
water molecules in brain tissues, and it helps in the non-invasive diagnosis of neurological
disorders. When the tissues have an internal fibrous structure, water molecules in
these tissues diffuse rapidly in directions aligned with this structure. Symmetric
positive definite matrices are important mathematical objects in this field, useful in the
analysis of such diffusion patterns [1]. The anisotropic index is a quantity that is
often used in this area [18]; it is the distance of a given SPD matrix from its
Nearest Isotropic Matrix (NIM). Mathematically, the NIM $\alpha I$ ($\alpha > 0$) of a given
tensor $P \succ 0$ with respect to a distance measure $\mathcal{D}(\cdot, \cdot)$ is defined via:

$$\min_{\alpha > 0} \mathcal{D}(\alpha I, P) \tag{12}$$

There are closed form expressions for $\alpha$ when $\mathcal{D}$ is AIRM, LERM, or KLDM
(see [18] for details). Unfortunately, for $J_{ld}$ there is no closed form. In the
following, we investigate this aspect of our measure and establish a few theoretical properties.

Theorem 4. Suppose $P \in \mathcal{S}^d_{++}$, and let $S = \alpha I$ be such that $J_{ld}(P, S)$ is convex
(see Theorem 3). Then the NIM to $P$ is the minimum positive root of the following
polynomial equation:

$$p(\alpha) := d\alpha^d + (d - 2)\sum_{i} \lambda_i \alpha^{d-1} + (d - 4)\sum_{i < j} \lambda_i \lambda_j \alpha^{d-2} + \cdots + (2 - d)\sum_{i}\prod_{j \neq i} \lambda_j \alpha - d\prod_{i} \lambda_i = 0, \tag{13}$$

where $\lambda_i$, $i = 1, 2, \ldots, d$, are the eigenvalues of $P$.

Proof. Using the definition of $J_{ld}$ in (12), and applying the assumption that $J_{ld}$ is
convex, at optimality we have $\partial J_{ld}(\alpha I, P)/\partial\alpha = 0$, which leads to:

$$\frac{1}{\alpha} = \frac{2}{d}\sum_{i=1}^{d} \frac{1}{\alpha + \lambda_i}.$$

Rearranging the terms, we have the polynomial equation in (13). Since the coefficient
of $\alpha^{d-1}$ is always positive (for $d > 2$), there must always exist at least one positive
root. □
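Although no closed form exists in general, the root of (13) is simple to obtain numerically. The following sketch builds the polynomial coefficients from the elementary symmetric polynomials of the eigenvalues and picks the smallest positive real root. The function names are illustrative, and the use of `np.poly` and `np.roots` is an implementation choice for this example, not something prescribed by the paper.

```python
import numpy as np

def nim_alpha(P):
    """Smallest positive root of the polynomial (13) for an SPD matrix P."""
    lam = np.linalg.eigvalsh(P)
    d = lam.size
    # Coefficients of prod_i (a + lam_i): e[k] is the k-th elementary
    # symmetric polynomial of the eigenvalues.
    e = np.poly(-lam)
    coeffs = (d - 2 * np.arange(d + 1)) * e   # coefficient of a^(d-k) in (13)
    roots = np.roots(coeffs)
    real = roots.real[np.abs(roots.imag) < 1e-9 * (1 + np.abs(roots.real))]
    return real[real > 0].min()

def jbld_to_isotropic(alpha, P):
    """J_ld(alpha * I, P), written in terms of the eigenvalues of P."""
    lam = np.linalg.eigvalsh(P)
    return (np.sum(np.log((alpha + lam) / 2))
            - 0.5 * (lam.size * np.log(alpha) + np.sum(np.log(lam))))

P2 = np.array([[2.0, 0.5], [0.5, 1.0]])
P3 = np.diag([0.2, 0.5, 0.9])
a2, a3 = nim_alpha(P2), nim_alpha(P3)
```

For the $2 \times 2$ example the result should coincide with $\sqrt{|P|}$, and `jbld_to_isotropic` can be used to confirm that the returned value is a minimizer over $\alpha > 0$.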

Corollary 5. When $d = 2$, we have $\alpha = \sqrt{|P|}$, which is the same as the NIM for the
Riemannian distance.


Since in DT-MRI, generally $3 \times 3$ SPD matrices are used, we show this case next.

Lemma 6. Let $P \in \mathcal{S}^d_{++}$ and suppose $\|P\|_2 < 1$, then

$$\frac{1 + \mathrm{Tr}(P)/d}{1 + \mathrm{Tr}(P^{-1})/d} > |P|. \tag{14}$$

Proof. Suppose $P \in \mathcal{S}^d_{++}$ and $\|P\|_2 < 1$, then $\mathrm{Tr}(P) < d$. Suppose $\lambda_i$, $i = 1, 2, \ldots, d$,
represent the eigenvalues of $P$. Then we have the following to prove from the
lemma:

$$\frac{d + \mathrm{Tr}(P)}{d|P| + \sum_{i}\prod_{j \neq i} \lambda_j} > 1. \tag{15}$$

Since $|P| < \mathrm{Tr}(P)/d$ (due to the AM-GM inequality) and since $\sum_{i}\prod_{j \neq i} \lambda_j < d$, we
have the desired result. □


Theorem 7. Let $P \in \mathcal{S}^3_{++}$, and if $S = \alpha I$, $\alpha > 0$ is the NIM to $P$, then $\alpha \in (0, 1)$.

Proof. Substituting $d = 3$ in (13), we have the following third degree polynomial
equation:

$$p(\alpha) := 3\alpha^3 + \mathrm{Tr}(P)\alpha^2 - |P|\,\mathrm{Tr}(P^{-1})\alpha - 3|P| = 0 \tag{16}$$

Analyzing the coefficients of $p(\alpha)$ shows that only one root is positive. Now, we have
$p(0) < 0$. Applying Lemma 6, we have $p(1) > 0$, which shows that the smallest
positive root lies in $(0, 1)$. □


3.3  Connections  to  Other  Metrics 

We summarize below some of the interesting connections $J_{ld}$ has with the standard
metrics on covariances.

Theorem 8 (Relations).

(i) $J_{ld} \leq D_R^2$

(ii) $J_{ld} \leq D_{kl}^2$

Proof. Let $v_i = \lambda_i(XY^{-1})$. Since $X, Y \in \mathcal{S}^d_{++}$, the eigenvalues $v_i$ are also positive,
whereby we can write each $v_i = e^{u_i}$ for some $u_i \in \mathbb{R}$. Using this notation, the AIRM
may be rewritten as $D_R(X, Y) = \|u\|_2$, and the JBLD as

$$J_{ld}(X, Y) = \sum_{i=1}^{d}\left(\log(1 + e^{u_i}) - u_i/2 - \log 2\right), \tag{17}$$

where the equation follows by observing that
$J_{ld}(X, Y) = \log|I + XY^{-1}| - \frac{1}{2}\log|XY^{-1}| - \log 2^d$.

To prove inequality (i), consider the function $f(u) = u^2 - \log(1 + e^u) + u/2 + \log 2$.
This function is convex since its second derivative

$$f''(u) = 2 - \frac{e^u}{(1 + e^u)^2}$$

is clearly nonnegative. Moreover, $f$ attains its minimum at $u^* = 0$, as is immediately
seen by solving the optimality condition $f'(u) = 2u - e^u/(1 + e^u) + 1/2 = 0$. Thus,
$f(u) \geq f(u^*) = 0$ for all $u \in \mathbb{R}$, which in turn implies that

$$\sum_{i=1}^{d} f(u_i) = D_R^2(X, Y) - J_{ld}(X, Y) \geq 0. \tag{18}$$

Similarly, to prove inequality (ii), consider the function $g(u) = D_{kl}^2 - J_{ld}$, which
expands to:

$$g(u) = \frac{1}{2}\left(e^u + \frac{1}{e^u}\right) - \log(1 + e^u) + \frac{u}{2} + \log 2 - 1 \tag{19}$$

Going by the same steps as before, it is straightforward to show that $g(u)$ is convex
and attains its minimum when $u = 0$, proving the inequality. □
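The two scalar functions used in this proof can be evaluated directly to see the inequalities per eigenvalue. The sketch below does this on a grid of $u$ values; the grid range is an arbitrary choice for illustration, and `np.logaddexp(0, u)` is used as a numerically stable way to compute $\log(1 + e^u)$.

```python
import numpy as np

def f(u):
    """Per-eigenvalue gap D_R^2 - J_ld from the proof of inequality (i)."""
    return u ** 2 - np.logaddexp(0.0, u) + u / 2 + np.log(2.0)

def g(u):
    """Per-eigenvalue gap D_kl^2 - J_ld, equation (19); cosh(u) = (e^u + e^-u)/2."""
    return np.cosh(u) - 1.0 - np.logaddexp(0.0, u) + u / 2 + np.log(2.0)

u = np.linspace(-10.0, 10.0, 2001)
f_min, g_min = f(u).min(), g(u).min()   # both expected to be 0 (attained at u = 0)
f_curvature = 2.0 - np.exp(u) / (1.0 + np.exp(u)) ** 2   # f''(u), always >= 1.75
```

Since $e^u/(1 + e^u)^2 \leq 1/4$, the second derivative of $f$ is bounded below by $7/4$, which is why the text calls it clearly nonnegative.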

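Theorem 9 (upper bound). If $0 \prec mI \preceq X, Y \preceq MI$, then

$$D_R^2(X, Y) \leq 2\log(M/m)\left(J_{ld}(X, Y) + \gamma\right), \tag{20}$$

where $\gamma = d\log 2$.

Proof. Observe that

$$\sum_{i=1}^{d}\left(\log(1 + e^{u_i}) - u_i/2 - \log 2\right) \geq \sum_{i=1}^{d}\left(|u_i|/2 - \log 2\right),$$

which implies the bound

$$J_{ld}(X, Y) + d\log 2 \geq \frac{1}{2}\|u\|_1. \tag{21}$$

Since $u^T u \leq \|u\|_\infty \|u\|_1$ (Hölder's inequality), using (21) we immediately obtain the
bound

$$D_R^2(X, Y) = \|u\|_2^2 \leq 2\|u\|_\infty\left(J_{ld} + \gamma\right), \tag{22}$$

where $\gamma = d\log 2$. But $mI \preceq X, Y \preceq MI$ implies that $\|u\|_\infty \leq \log(M/m)$, which
concludes the proof. □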
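Together with Theorem 8 (i), this bound sandwiches the squared AIRM between two expressions in $J_{ld}$. The sketch below checks both sides on random matrices whose eigenvalues are drawn from $[m, M]$. The sampling scheme, the helper names, and the values of $d$, $m$ and $M$ are assumptions of this example.

```python
import numpy as np

def spd_with_spectrum_in(d, m, M, rng):
    """Hypothetical helper: random SPD matrix with eigenvalues in [m, M]."""
    Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
    return (Q * rng.uniform(m, M, size=d)) @ Q.T

def jbld(X, Y):
    ld = lambda A: np.linalg.slogdet(A)[1]
    return ld((X + Y) / 2) - 0.5 * (ld(X) + ld(Y))

def airm_sq(X, Y):
    lam = np.linalg.eigvals(np.linalg.solve(X, Y)).real
    return np.sum(np.log(lam) ** 2)

rng = np.random.default_rng(4)
d, m, M = 5, 0.1, 10.0
gamma = d * np.log(2.0)

sandwich_holds = True
for _ in range(500):
    X = spd_with_spectrum_in(d, m, M, rng)
    Y = spd_with_spectrum_in(d, m, M, rng)
    J, D2 = jbld(X, Y), airm_sq(X, Y)
    lower_ok = J <= D2 + 1e-9                                  # Theorem 8 (i)
    upper_ok = D2 <= 2 * np.log(M / m) * (J + gamma) + 1e-9    # Theorem 9
    sandwich_holds = sandwich_holds and bool(lower_ok and upper_ok)
```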

Our next result touches upon a condition under which $J_{ld} \leq D_{le}^2$. A more general
treatment of this relationship is outside the scope of this paper, mainly because the gradient
and the Hessian of $D_{le}$ do not have closed forms.

Theorem 10. If $X, Y \in \mathcal{S}^d_{++}$ commute, then $J_{ld} \leq D_{le}^2$.

Proof. We use the fact that when $X, Y$ commute, $D_{le}(X, Y) = D_R(X, Y)$ (see
Proposition 1). Now, using the connection between AIRM and JBLD (refer to
Theorem 8), we have the result. □

3.4 JBLD Geometry

In Figure 1, we plot the three dimensional balls (isosurfaces) associated with JBLD
for various radii (0.1, 0.5 and 1), centered at the identity tensor. We also compare
the JBLD ball with the isosurfaces of the Frobenius distance, AIRM and KLDM. As
expected, the Frobenius distance is isotropic and thus its balls are spherical, while AIRM
and KLDM induce convex balls. In contrast, and as was pointed out by Theorem 3,
the isosurfaces of JBLD are convex in some range and become concave as the radius
grows large.


Figure 1: Isosurface plots for various distance measures. First, distances of arbitrary
three dimensional covariances from the identity matrix are computed, and then
isosurfaces corresponding to fixed distances of 0.1, 0.5 and 1 are plotted. The plots show the
surfaces for (from left): Frobenius distance, AIRM, KLDM, and JBLD.


3.5  Computational  Advantages 

The greatest advantage of $J_{ld}$ over the Riemannian metric is its computational speed:
$J_{ld}$ requires only the computation of determinants, which can be done rapidly via 3 Cholesky
factorizations (for $X + Y$, $X$ and $Y$), each at a cost of $(1/3)d^3$ flops [36]. Computing
$D_R$, on the other hand, requires generalized eigenvalues, which can be computed for
positive definite matrices in approximately $4d^3$ flops. Thus, in general $J_{ld}$ is much faster (see
also Table 1). The computational advantages of $J_{ld}$ are even more pronounced when
comparing the evaluation of gradients (see footnote 1). Table 2 shows that computing $\nabla J_{ld}$ can be
more than 100 times faster than computing $\nabla D_R$. This speed proves critical for NN retrieval, or
more generally when using any algorithm that depends on gradients of the similarity
measure, e.g., see [37] and the references therein. Table 3 provides a summary of the
various metrics, their gradients and computational complexities.
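Part of the reason the gradient is cheap is that it has a simple closed form: differentiating (9) with respect to $X$ gives $\nabla_X J_{ld}(X, Y) = (X + Y)^{-1} - \frac{1}{2}X^{-1}$, the expression listed for JBLD in Table 3. The sketch below compares it against a central finite difference along a random symmetric direction. The step size and the test matrices are arbitrary choices for illustration.

```python
import numpy as np

def jbld(X, Y):
    ld = lambda A: np.linalg.slogdet(A)[1]
    return ld((X + Y) / 2) - 0.5 * (ld(X) + ld(Y))

def jbld_grad(X, Y):
    """Gradient of J_ld(X, Y) with respect to X."""
    return np.linalg.inv(X + Y) - 0.5 * np.linalg.inv(X)

rng = np.random.default_rng(5)
A, B, H = rng.normal(size=(3, 4, 4))
X, Y = A @ A.T + 4 * np.eye(4), B @ B.T + 4 * np.eye(4)
H = (H + H.T) / 2                      # symmetric perturbation direction

t = 1e-6
numeric = (jbld(X + t * H, Y) - jbld(X - t * H, Y)) / (2 * t)
analytic = np.trace(jbld_grad(X, Y) @ H)   # directional derivative <grad, H>
```

The gradient needs only two matrix inverses and no eigendecomposition or matrix logarithm, in contrast to the AIRM gradient.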

4  Fast  Nearest  Neighbor  Retrieval  using  JBLD 

Now we turn to the key application that originally motivated us to investigate $J_{ld}$:
Nearest Neighbor (NN) retrieval for covariance matrices. Here, we have a dataset
$\{S_1, \ldots, S_n\}$ of $d \times d$ covariance matrices that we must organize into a data structure
to facilitate rapid NN retrieval. Towards this end, we chose to use the metric tree
data structure, as we wanted to show the performance of an exact NN algorithm for
covariances, one for which approximations can easily be introduced for faster searches. A
key component of the metric tree is a procedure to partition the data space into mutually
exclusive clusters, so that heuristics such as branch and bound can be applied to prune
clusters that are unlikely to contain candidate neighbors of a query. To this end, we
derive below a K-Means algorithm on $J_{ld}$, which will later be used to build the metric
tree on covariances.

1 Technically, the computation of $J_{ld}$ for matrices with d > 13 was observed to be faster when the
determinants were computed using the Cholesky decomposition.

d       D_R                    J_ld
5       .025 ± .012            .030 ± .007
10      .036 ± .005            .040 ± .009
15      .061 ± .002            .050 ± .004
20      .085 ± .006            .061 ± .009
40      .270 ± .332            .123 ± .012
80      1.23 ± .055            .393 ± .050
200     8.198 ± .129           2.223 ± .169
500     77.311 ± .568          22.186 ± 1.223
1000    492.743 ± 15.519       119.709 ± 1.416

Table 1: Average times (millisecs/trial) to compute function values; computed over
10,000 trials to reduce variance.

d       ∇_X D_R^2(X,Y)           ∇_X J_ld(X,Y)
5       0.798 ± .093             .036 ± .009
10      2.383 ± .209             .058 ± .021
20      7.493 ± .595             .110 ± .013
40      24.899 ± 1.126           .270 ± .047
80      99.486 ± 5.181           .921 ± .028
200     698.873 ± 39.602         8.767 ± 2.137
500     6377.742 ± 379.173       94.837 ± 1.195
1000    40443.059 ± 2827.048     622.289 ± 37.728

Table 2: Average times (millisecs/trial) to compute gradients; computed over 1000
trials to reduce variance.


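metric   D^2(X,Y)                               FLOPS                   Gradient (∇_X)
AIRM     ||log(X^{-1/2} Y X^{-1/2})||_F^2       4d^3                    2X^{-1} log(XY^{-1})
LERM     ||log(X) - log(Y)||_F^2                [illegible in source]   2X^{-1}(log X - log Y)
KLDM     (1/2) Tr(X^{-1}Y + Y^{-1}X - 2I)       [illegible in source]   Y^{-1} - X^{-1} Y X^{-1}
JBLD     log|(X+Y)/2| - (1/2) log|XY|           d^3                     (X+Y)^{-1} - (1/2) X^{-1}

Table 3: A comparison of various metrics on covariances and their computational
complexities against $J_{ld}$.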
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4.1  K-Means  with  Jpd 


In  this  section,  we  derive  a  K-Means  clustering  algorithm  based  on  J(d .  Let  Si ,  S2 ,  •  •  •  ,  Sn 
be  the  input  covariances  that  we  need  to  be  clustered.  A  standard  K-Means  algorithm 
gives  rise  to  the  following  optimization  problem: 

K 

min  V^4(45)i  (23) 

Cl’C2’-’CKk=iSeck 

where  Xk  is  the  centroid  of  cluster  Ck .  Following  the  traditional  K-Means  algorithm, 
we  can  alternate  between  the  centroid  computation  and  the  clustering  stages  to  mini¬ 
mize  (23).  The  only  significant  step  then  amounting  to  the  computation  of  the  centroid 
for  the  fcth  cluster,  which  can  be  written  as: 

(24) 

\log\XkS\  (25) 

Unfortunately,  as  we  saw  earlier,  Jpd  is  neither  a  Bregman  divergence,  nor  is  it  convex 
and  thus  we  cannot  use  the  traditional  centroid  computation.  The  good  news  is  that,  we 
can  write  (25)  as  the  sum  of  a  convex  function  Fvex(Xk,  S)  =  —  1°§  I  Ys| 

and  a  concave  term  Fcave(Xk,  S )  =  YlseCk  1°S  I  Xk%S  I •  Such  a  combination  of  con¬ 
vex  and  concave  objectives  can  be  efficiently  solved  using  Majorization-Minimization 
through  the  Convex-ConCave  Procedure  (CCCP)  [38].  The  main  idea  of  this  procedure 
is  to  approximate  the  concave  part  of  the  objective  by  its  first  order  Taylor  approxima¬ 
tion  around  the  current  best  estimate  X that  is,  for  the  (t  +  l)st  step: 

Xl+l  =  argmin  Fvex (Xk ,  S)  -  Xj V* „  Fcave (X* ,  S ) .  (26) 

xk 

Substituting  (26)  in  (25),  later  taking  the  gradient  of  (25)  with  respect  to  Xk  and  setting 
it  to  zero  (recall  that  now  we  have  a  convex  approximation  to  (25)),  we  have: 

E  VXkFvex(Xl+1,  S)  =  -J2  VXkFcave(Xl  S ).  (27) 

seck  seck 


F  :=  min  Y]  Jid(Xk,S) 
Xk  seck 

■  i  1  xk  +  s 
:=mm  ^  log  |  ^  I 

S£Ck 


Expanding the gradient terms for $J_{\ell d}$, we have the following fixed-point iteration:

$$X_k^{t+1} = \left[\frac{1}{|C_k|} \sum_{S \in C_k} \left(\frac{S + X_k^t}{2}\right)^{-1}\right]^{-1}. \qquad (28)$$
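
As an illustration, the fixed-point iteration (28), which sets the next centroid to the inverse of the average of the inverted midpoints $(S + X_k^t)/2$ over the cluster, can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the MATLAB implementation used in the experiments: the function names (`jbld`, `jbld_centroid`, `random_spd`), the arithmetic-mean initialization, the stopping tolerance, and the synthetic test matrices are all hypothetical choices made for the example.

```python
import numpy as np


def jbld(X, Y):
    """Jensen-Bregman LogDet divergence: log|(X+Y)/2| - 0.5*log|XY|."""
    _, ld_mid = np.linalg.slogdet((X + Y) / 2.0)
    _, ld_x = np.linalg.slogdet(X)
    _, ld_y = np.linalg.slogdet(Y)
    return ld_mid - 0.5 * (ld_x + ld_y)


def jbld_centroid(mats, tol=1e-10, max_iter=500):
    """Fixed-point iteration (28) for the centroid of the SPD matrices `mats`."""
    X = np.mean(mats, axis=0)  # assumed initialization: arithmetic mean
    for _ in range(max_iter):
        # Average the inverses of the midpoints (S + X^t)/2 over the cluster ...
        avg_inv = np.mean([np.linalg.inv((S + X) / 2.0) for S in mats], axis=0)
        # ... and invert the average to obtain X^{t+1}.
        X_new = np.linalg.inv(avg_inv)
        if np.linalg.norm(X_new - X, "fro") <= tol * np.linalg.norm(X, "fro"):
            return X_new
        X = X_new
    return X


def random_spd(rng, d):
    """Synthetic SPD matrix (hypothetical test data, not from the paper)."""
    A = rng.standard_normal((d, 2 * d))
    return A @ A.T / (2 * d) + 1e-3 * np.eye(d)


rng = np.random.default_rng(0)
cluster = [random_spd(rng, 4) for _ in range(20)]
centroid = jbld_centroid(cluster)
```

Each iteration costs one matrix inversion per cluster member plus one final inversion, with no matrix logarithms or eigendecompositions.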


Convergence  of  the  CCCP  procedure  is  tied  to  the  compactness  of  the  solution  space. 
Unfortunately,  the  space  of  SPD  matrices  is  of  non-compact  type  [39]  with  a  non¬ 
positive  sectional  curvature;  the  latter  property  implying  that  the  barycenter  of  a  set  of 
covariances  in  the  respective  Riemannian  manifold  need  not  be  unique  [40].  Thus,  in 
the  following  we  investigate  the  convergence  of  the  fixed  point  iteration  in  (28). 

Lemma 11. The function $f(X) = X^{-1}$ for $X \in \mathcal{S}_{++}^d$ is matrix convex, i.e., for $X, Y \in \mathcal{S}_{++}^d$ and for $t \in [0, 1]$,

$$f(tX + (1 - t)Y) \preceq t f(X) + (1 - t) f(Y). \qquad (29)$$

Proof. See Exercise V.1.15 [23] for details. □

Lemma 12. If $X, Y \in \mathcal{S}_{++}^d$ and $X \succeq Y$, then $X^{-1} \preceq Y^{-1}$.

Proof. See Corollary 7.7.4 [41]. □


Theorem 13. Let $S_1, S_2, \ldots, S_n$ be the input covariances and let $X^*$ be the centroid found by (28). Then $X^*$ lies in the compact interval

$$\left(\frac{1}{n}\sum_{i=1}^{n} S_i^{-1}\right)^{-1} \preceq X^* \preceq \frac{1}{n}\sum_{i=1}^{n} S_i. \qquad (30)$$

Proof. Proving the left inequality: Applying Lemma 11 to (28) at the fixed point $X^*$, we have:

$$(X^*)^{-1} = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{S_i + X^*}{2}\right)^{-1} \qquad (31)$$

$$\preceq \frac{1}{n}\sum_{i=1}^{n}\frac{S_i^{-1}}{2} + \frac{1}{2}(X^*)^{-1}. \qquad (32)$$

Rearranging gives $(X^*)^{-1} \preceq \frac{1}{n}\sum_{i=1}^{n} S_i^{-1}$. Now, applying Lemma 12, the result follows.

Proving the right inequality: As one can see, the right side of (28) is essentially the harmonic mean of $\frac{X^* + S_i}{2}$ for $i = 1, 2, \ldots, n$. Using the fundamental inequality that the harmonic mean is always less than or equal to the arithmetic mean, we have the result. □

Theorem 14. Let $\{X^t\}$ (for $t \ge 1$) be the sequence of successive iterates generated as per (28). Then $X^t \to X^*$, where $X^*$ is a stationary point of (25).

Proof. It is clear that $F_{vex}$ and $-F_{cave}$ are strictly convex functions and $-\nabla F_{cave}$ is continuous. Further, from Theorem 13 it is clear that the solution lies in a compact interval inside $\mathcal{S}_{++}^d$. Thus, following the conditions of convergence stipulated in [42] (CCCP-II, Theorem 8), the iterations in (28) converge for a suitable initialization inside the compact set. □


4.2  NN  Using  Metric  Tree 

As  we  mentioned  earlier,  we  decided  to  use  a  metric  tree  for  the  task  of  efficient  NN 
retrieval  on  covariance  datasets.  Metric  Trees  (MT)  [43]  are  one  of  the  fundamental  tree 
based  algorithms  for  fast  NN  retrieval  useful  when  the  underlying  similarity  measure  is 
a  metric.  NN  using  the  MT  involves  two  steps:  (i)  Building  the  tree,  and  (ii)  Querying 
the  tree.  We  discuss  each  of  these  steps  below. 

4.2.1  Building  MT


To build the MT, we perform top-down partitioning of the input space by recursively applying the JBLD K-Means algorithm (introduced above). Each partition of the MT is identified by a centroid and a ball radius. For $n$ data points, and assuming we bipartition each cluster recursively, the total build time of the tree is $O(n \log n)$ (ignoring the cost of K-Means itself). To save time, we stop partitioning a cluster when the number of points in it falls below a certain threshold; this threshold is selected to balance the computational time of an exhaustive search over the cluster elements against that of running K-Means on it.

4.2.2  Querying  using  MT 

Given a query point $q$, one first performs a greedy binary search for the NN along the most proximal centroids at each level. Once a leaf partition is reached, exhaustive search is used to localize to the candidate centroid $X_c$. Then one backtracks to check whether any of the sibling nodes (that were temporarily ignored in the greedy search) contains a data point that is closer to $q$ than $X_c$. To this end, we solve the following optimization problem on each of the sibling centroids $C$:

$$\mathcal{D}(X_c, q) > \min_{X:\, \mathcal{D}(X, C) = R} \mathcal{D}(X, q), \qquad (33)$$

where $X$ is called the projection of $q$ onto the ball with centroid $C$ and radius $R$, and $\mathcal{D}$ is some distance function. If (33) is satisfied, then the sibling node should be explored; otherwise it can be pruned. When $\mathcal{D}$ is a metric, (33) has a simple solution utilizing the triangle inequality, as described in [44]. The mechanism can be extended to retrieve k-NN by repeating the search while ignoring the (k-1) NNs already retrieved. This can be efficiently implemented by maintaining a priority queue of potential sub-tree centroids and worst-case distances of the query to any candidate node in each sub-tree, as described in [43].
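
The pruning test in (33) can be sketched as below for the metric case. The sketch assumes the usual triangle-inequality bound: for any point $X$ in a ball with centroid $C$ and radius $R$, the distance from $q$ to $X$ is at least the distance from $q$ to $C$ minus $R$, so that quantity serves as a lower bound on the minimum in (33). The function name `should_explore` and the toy one-dimensional metric are hypothetical and are not taken from [43] or [44].

```python
def should_explore(dist_q_to_best, dist_q_to_centroid, radius):
    """Return True if a sibling ball may hold a point closer than the current best.

    For a metric D and any X inside the ball, the triangle inequality gives
    D(X, q) >= D(q, C) - R. The bound is clamped at zero for queries that
    fall inside the ball.
    """
    lower_bound = max(dist_q_to_centroid - radius, 0.0)
    return dist_q_to_best > lower_bound


def dist(a, b):
    """Toy metric on the real line, used only for the example."""
    return abs(a - b)


q, best = 2.0, 2.6
far_ball = should_explore(dist(q, best), dist(q, 10.0), 3.0)   # ball [7, 13]
near_ball = should_explore(dist(q, best), dist(q, 3.0), 1.0)   # ball [2, 4]
```

Here the far ball is pruned, since its closest possible point is 5 units away while the current best is about 0.6 away, whereas the near ball contains the query and has to be searched.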

5  Experiments 

We are now ready to describe our experimental setup and results to substantiate the effectiveness of $J_{\ell d}$. We first discuss the performance metric on which our experiments are based, then provide simulation results exposing various aspects of our metric, followed by results on four real-world datasets. All algorithms were implemented in MATLAB and tested on a machine with a 3GHz single-core CPU and 4GB RAM.

5.1  Performance  Metric 

Accuracy@K: Suppose we have a covariance dataset $\mathcal{D}$ and a query set $Q$. Accuracy@K describes the average accuracy when retrieving the $K$ nearest covariances from $\mathcal{D}$ for each item in $Q$. Suppose $G_q^K$ stands for the ground truth label subset associated with the $q$th query, and $M_q^K$ denotes the label subset associated with the $K$ nearest covariances found using a metric $M$ for the query $q$; then we formally define:

$$\text{Accuracy@K} = \frac{1}{|Q|}\sum_{q \in Q}\frac{|G_q^K \cap M_q^K|}{|G_q^K|}. \qquad (34)$$

Note that Accuracy@K as defined in (34) subsumes the standard performance metrics of precision and recall. Most often we work with $K = 1$, in which case we drop the suffix and refer to it simply as Accuracy. Since some of the datasets used in our experiments do not have ground truth data available, the baselines for comparison were decided via a linear scan using the AIRM metric, as this metric is deemed the state of the art on covariance data.
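
A small worked example may help make the measure in (34) concrete. The sketch below assumes that each query contributes the fraction of its ground-truth label subset that is recovered by the retrieved subset, and that these fractions are averaged over the query set; the function name `accuracy_at_k` and the toy labels are hypothetical.

```python
def accuracy_at_k(ground_truth, retrieved):
    """Average over queries of |G_q & M_q| / |G_q|.

    ground_truth, retrieved: dicts mapping a query id to a set of labels.
    """
    total = 0.0
    for q, g in ground_truth.items():
        m = retrieved.get(q, set())
        total += len(g & m) / len(g)
    return total / len(ground_truth)


# Two toy queries: the first is fully recovered, the second only half.
gt = {"q1": {"a", "b"}, "q2": {"c", "d"}}
found = {"q1": {"a", "b"}, "q2": {"c", "x"}}
score = accuracy_at_k(gt, found)  # (2/2 + 1/2) / 2 = 0.75
```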

5.2  Simulations 

Before we delve into the details of our experiments, we highlight the base experimental configuration that we used for all the simulation experiments. Since a variety of code optimizations and offline computations are possible for the various metrics, we decided to test all the algorithms with the base implementation as provided by MATLAB. The exception is the set of experiments using LERM. We found that computing LERM by projecting the input matrices into the log-Euclidean space (through matrix logarithms) resulted in expensive computations, which made its performance incomparable with the setup used for the other metrics. Thus, before using this metric, we took the logarithm of all the covariances offline.
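
The offline step for LERM can be sketched as follows. The sketch assumes the standard log-Euclidean distance, the Frobenius norm of the difference of the matrix logarithms, and computes the logarithm of an SPD matrix through its eigendecomposition; the function names and the synthetic data are hypothetical, and the experiments themselves were run in MATLAB.

```python
import numpy as np


def spd_log(S):
    """Matrix logarithm of an SPD matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(S)
    # Scale column j of V by log(w[j]), i.e. V @ diag(log w) @ V.T.
    return (V * np.log(w)) @ V.T


def lerm(log_X, log_Y):
    """Log-Euclidean distance between two matrices given their logarithms."""
    return np.linalg.norm(log_X - log_Y, "fro")


rng = np.random.default_rng(0)
database = []
for _ in range(50):
    A = rng.standard_normal((5, 10))
    database.append(A @ A.T / 10 + 1e-3 * np.eye(5))

# Offline: one logarithm per database matrix, paid once.
database_logs = [spd_log(S) for S in database]

# Online: one logarithm for the query, then only Frobenius norms.
query_log = spd_log(database[3])
nearest = min(range(len(database_logs)),
              key=lambda i: lerm(database_logs[i], query_log))
```

With the logarithms cached, each distance evaluation reduces to a Euclidean computation, which is why the cost of the projection is reported separately from the query time.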

For the NN experiments, we used a metric tree with four branches and allowed a maximum of 100 data points at the leaf nodes. With regard to computing the cluster centroids (for k-means), the LERM and FROB metrics used the ordinary Euclidean sample mean, while AIRM used the Fréchet mean computed with the iterative approximation algorithm described in [45]. The centroid for KLDM boils down to computing the solution of a Riccati equation, as described in [46]. For the simulation experiments, we used the results produced by AIRM as the ground truth.

Now we are ready to describe our base configuration for the various simulation experiments. We used 1K covariances of dimension 10 × 10, drawn from 50 true clusters, as the dataset and a collection of 100 covariances as the query set. The plots that follow show average performance over 100 repetitions of each experiment, each using a different database and query set. Next, we consider the various experiments and present the results.

5.2.1  Accuracy  Against  Noise 

Given that the metrics on covariances are nonlinear, the primary goal of this experiment is to validate the robustness of JBLD against noise in the covariance descriptors for the task of NN retrieval. This is especially useful when considering that our data can be poorly conditioned, such that small perturbations of poorly conditioned data matrices can lead to large metric distances, which for some applications might be undesirable. Towards this end, we created a base set of 1K covariances from a set of simulated feature vectors. Subsequently, Gaussian noise of varying magnitude (relative to the signal strength) was added to the feature vectors to obtain a set of 100 noisy covariances. The base covariances were used as queries while the noisy ones served as the database. A linear scan through the data using the Riemannian metric to measure nearness defined the ground truth. Fig. 2 shows the average accuracy values for decreasing SNR for three different covariance dimensions (10D, 20D and 40D). It is clear that JBLD is more robust than LERM and KLDM, while yielding accuracy close to the baseline Riemannian metric, irrespective of the dimension of the matrix. Retrieval using the Frobenius distance (FROB) clearly performs poorly. We also noticed a small drop in the accuracy of KLDM (as seen in Figure 2(c)) as the data dimensionality increases, which we suspect is due to the poor conditioning of the data matrices as the dimensionality grows, impacting the matrix inversions.

5.2.2  Effect  of  Cluster  Size 

This section analyzes the scalability of $J_{\ell d}$ to an increasing number of true data clusters (given a fixed database size). The basic goal of this experiment is to expose the clustering performance of our $J_{\ell d}$-kmeans algorithm against kmeans based on other metrics. The performance comparison is analyzed on three aspects: (i) the average accuracy of NN retrieval, (ii) the average metric tree creation time (which includes kmeans clustering for each internal node of the metric tree), and (iii) the average search time using a metric tree. Figure 3 shows results from this experiment. A few important properties of the metrics are revealed by these plots: (i) the accuracy of $J_{\ell d}$ matches that of AIRM perfectly (note that AIRM is used as the ground truth), (ii) assuming the metric tree is constructed optimally, the search times for AIRM and $J_{\ell d}$ are comparable, and (iii) most importantly, the metric tree construction time for AIRM increases almost quadratically with an increasing number of true clusters, while that for the other metrics is more favorable. Together, the three plots substantiate the superior performance of $J_{\ell d}$. Later in this paper, we will return to illustrating these claims on real data.

5.2.3  Effect  of  Matrix  Dimension 

One of the major motivations for proposing $J_{\ell d}$ as a replacement for existing metrics on covariances is its scalability to increasing matrix dimensions. Figure 4 shows the results for accuracy, metric tree creation time and search time using a metric tree. As is clear from the plots, the metric tree creation time is many orders of magnitude worse with AIRM than with the other metrics, while $J_{\ell d}$ performs better in accuracy and retrieval time than the other metrics. Similar to what we noticed in Figure 2, the accuracy of KLDM worsens as the matrix dimension increases.

5.2.4  Effect  of  Increasing  Database  Size 

This experiment shows the performance of $J_{\ell d}$ when searching in larger datasets. Towards this end, we kept the number of true clusters constant and the same as in the other experiments, but increased the number of data points (covariances) associated with each cluster. The results of this experiment in terms of accuracy, tree buildup time and retrieval performance are shown in Figure 5. Similar to the previous plots, it is clear that $J_{\ell d}$ provides promising results in all three properties while maintaining nearly perfect retrieval accuracy, showing that it does not get distracted from the nearest neighbor even when the data size increases.

5.3  Real  Data  Experiments 

Continuing from the simulated performance figures of $J_{\ell d}$ against other metrics, this subsection provides results on real data. First, we showcase a few qualitative results from some important applications of covariances from the literature. We will demonstrate that JBLD outperforms other metrics in accuracy not only when AIRM is assumed to be the ground truth, but also in situations when we know the correct ground truth of the data as provided by an external agency or human labeling.

5.3.1  Tracking  using  Integral  Images 

People appearance tracking has been one of the most successful applications using covariances. We chose to experiment with some of the popular tracking scenarios: (i) face tracking under affine transformations, (ii) face tracking under changes in pose, and (iii) vehicle tracking. For (i) and (ii), the tracking dataset described in [47] was used, while the vehicle tracking video was taken from the ViSOR repository.² The images from the video were resized to 244 × 320 for speed, and integral images were computed on each frame. An input tracking region was given at the beginning of the video, which was then tracked in subsequent images using the integral transform, with covariances computed from the features in this region. We used the color and the first-order gradient features for the covariances. Figures 6(a), 6(b), and 6(c) show qualitative results from these experiments. We compared the tracking windows for both AIRM and JBLD, and found that they always fall at the same location in the video (and hence are not shown separately).

5.3.2  Texture  Segmentation 

Another important application of covariances has been in texture segmentation [4], which has further applications in DT-MRI, background subtraction [12], etc. In Figure 6(e), we present a few qualitative results from segmentation on the Brodatz texture dataset. Each of the images was a combination of two different textures, the objective being to find the boundary and separate the classes. We first transformed the given texture image into a tensor image, where each pixel was replaced by a covariance matrix computed using all the pixels in a $p \times p$ patch around the given pixel. The 5 × 5 covariances were computed using features such as the image coordinates of the pixels in this patch, the image intensity at each pixel, and first-order moments. Next, we applied the JBLD kmeans algorithm to the texture mixture, then segregated the patches using their cluster labels.

² http://www.openvisor.org

Figure 2: Accuracy against increasing noise for various matrix dimensions $n$: (a) $n = 10 \times 10$, (b) $n = 20 \times 20$, (c) $n = 40 \times 40$. It is assumed that AIRM is the ground truth. MFD stands for the Matrix Frobenius Distance.

Figure 3: Fixed dataset size of 1K, query size of 100, and increasing number of true clusters: 3(a) accuracy of search, 3(b) time to create the metric tree, and 3(c) speed of retrieval using the metric tree. The average is computed over 100 trials.

Figure 4: Fixed dataset size of 1K, query size of 100, and increasing covariance matrix dimensions: 4(a) accuracy of search, 4(b) time to create the metric tree, and 4(c) speed of retrieval using the metric tree. The average is computed over 100 trials.

Figure 5: Fixed number of true clusters, query size of 100, and increasing covariance dataset size: 5(a) accuracy of search, 5(b) time to create the metric tree, and 5(c) speed of retrieval using the metric tree. The average is computed over 100 trials.

Figure 6: Tracking using JBLD on covariances computed from integral images: (a) affine face tracking, (b) tracking face with pose variations, (c), (d) vehicle tracking, and (e) results from texture segmentation. The red rectangle in the first image in each row shows the object being tracked. The yellow rectangles in the subsequent images are the nearest objects returned by JBLD. Panel (e) shows sample results from three texture segmentation experiments: the left image in each pair shows the original mixed texture image and the right image in each pair shows the output of segmentation, with one texture masked out.

5.4  Real-Data  NN  Experiments

Now we are ready to present quantitative results on real-world datasets. For the real-world experiments described in the subsequent sections, we use four different vision applications for which covariance descriptors have been shown to produce promising results: (i) texture recognition, (ii) action recognition, (iii) face recognition, and (iv) people appearance tracking. We briefly review below each of these datasets and how covariances were computed for each application. See Figure 7 for sample images from each dataset.

Texture Dataset: Texture recognition has been one of the oldest applications of covariances, spanning a variety of domains, e.g., DT-MRI, satellite imaging, etc. The texture dataset for our experiments was created by combining the 160 texture images in the Brodatz dataset and the 60 texture classes in the CURET dataset [48]. Each texture category in the Brodatz dataset consisted of one 512 × 512 image. To create the covariances from these images, we followed the suggestions in [4]. First, patches of size 20 × 20 were sampled from random locations in each image; then the image coordinates of each pixel in a patch, together with the image intensity and the first-order gradients, were used to build 5D features. The covariance matrix computed from these feature vectors over all the pixels inside a patch constituted one data matrix, and we generated approximately 5K such covariances from all the texture images in all the categories of the Brodatz dataset. To build a larger dataset for textures, we combined this dataset with texture covariances from the CURET dataset [48], which consists of 60 texture categories, with each texture having varying degrees of illumination and pose variations. Using the RGB color information, together with the 5 features described before, we created approximately 27K covariances, each of size 8 × 8. To have covariances of the same dimensionality across the two datasets, we appended an identity block with a small diagonal scaling for the RGB dimensions to the covariances computed from the Brodatz dataset.
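
To illustrate how a single 5 × 5 descriptor of this kind could be formed, the sketch below builds the five per-pixel features (two image coordinates, intensity, and two first-order gradients) for one patch and takes their sample covariance. The use of gradient magnitudes, the small diagonal regularizer `eps`, the function name `patch_covariance`, and the random test patch are all assumptions made for the example; the exact feature definitions follow [4] and are not reproduced here.

```python
import numpy as np


def patch_covariance(patch, eps=1e-6):
    """5x5 covariance descriptor of a grayscale patch (illustrative sketch)."""
    patch = patch.astype(float)
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w]          # pixel coordinates
    gy, gx = np.gradient(patch)          # first-order gradients (rows, cols)
    feats = np.stack([
        xs.ravel(),
        ys.ravel(),
        patch.ravel(),
        np.abs(gx).ravel(),              # assumed: gradient magnitudes
        np.abs(gy).ravel(),
    ], axis=0)                           # shape (5, h*w): one row per feature
    cov = np.cov(feats)                  # rows are treated as variables
    # Assumed regularizer that keeps the descriptor strictly positive definite.
    return cov + eps * np.eye(cov.shape[0])


rng = np.random.default_rng(0)
patch = rng.random((20, 20))             # stand-in for a 20 x 20 texture patch
C = patch_covariance(patch)
```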

Action Recognition Dataset: Activity recognition via optical flow covariances is a recent addition to the family of applications with covariance descriptors, and shows great promise. For every pair of frames in a given video, the optical flow is first computed; the flow is then thresholded and 12D feature vectors are extracted from each non-zero flow location (refer to [13] for details on this feature vector). It is proposed that the covariance computed from the optical flow features captures the profile of that activity uniquely. To build the optical flow covariance dataset, we used a combination of activity videos from the Weizmann activity dataset [49], the KTH dataset,³ and the UT tower dataset [50]. This resulted in a large dataset of approximately 63.5K covariances, each of dimension 12 × 12.

Face recognition: Face recognition is still an active area of research in computer vision and many effective ideas have been suggested. In [10], the idea of covariance descriptors was extended to recognizing faces, where each face image was convolved with 40 Gabor filters, the outputs of which were then collated to form 40 × 40 covariances. Although covariance descriptors are not the state of the art in face recognition, our choice of this application for this paper is to analyze the performance of our metric on real data of large dimensions. Towards this end, we used the images from the Faces in the Wild dataset [51], which consists of approximately 31K face images mainly collected from newspapers. We used the same approach as in [10] for computing the covariances, along with incorporating the RGB color information of each pixel and the first and second order intensity gradients to form 48 × 48 covariances.

³ http://www.nada.kth.se/cvap/actions/

People Appearances: An important real-time application of covariances is people tracking from surveillance cameras [4]. To analyze the suitability of our metric for such applications, we illustrate empirical results on tracking data. For this experiment, we used videos of people appearances tracked using multiple cameras.⁴ The background was first learned using a mixture of Gaussians, then the silhouettes of people in the scene were extracted. The first and second order image gradients along with the color information were used to obtain approximately 10K covariances of size 8 × 8.

Ground Truth: Note that the texture dataset, the action dataset and the faces dataset have ground truth labels associated with each data point, and thus for accuracy comparisons we directly compare the class label of each query against the class label of the NN found by a metric. Unfortunately, the people appearances dataset does not have a ground truth, and thus we use the label of the NN found by AIRM as the ground truth.


Figure 7: Sample images from the various datasets used in our real-world data experiments: 7(a) texture images from the Brodatz dataset, 7(b) Faces in the Wild dataset, and 7(c) people appearance tracking dataset.


5.5  NN  via  Exhaustive  Search 

Here we present our experiments and results for NN via exhaustive search using the various metrics. Exhaustive search is important from a practical point of view, as most real-time applications (such as tracking) cannot spend time building a metric tree. In this section, we analyze the performance of JBLD in terms of accuracy and retrieval speed on each of the datasets described in the previous section.

⁴ http://cvlab.epfl.ch/research/body/surv/#data

| Dataset (size) | Measure | AIRM | JBLD | LERM | KLDM | CHOL | FROB |
|---|---|---|---|---|---|---|---|
| Texture (25852) | Avg. Accuracy (%) | 85.5 | 85.5 | 82.0 | 85.5 | 63.0 | 56.5 |
| | Avg. Time (s) | 1.63 | 1.50 | 1.16 (4.21) | 1.71 | 1.81 | 1.21 |
| Activity (62425) | Avg. Accuracy (%) | 99.5 | 99.5 | 96.5 | 99.5 | 92.0 | 82.5 |
| | Avg. Time (s) | 4.04 | 3.71 | 2.42 (10.24) | 4.34 | 4.98 | 2.53 |
| Faces Wild (29700) | Avg. Accuracy (%) | 32.5 | 33.0 | 30.5 | 31.5 | 29.5 | 26.5 |
| | Avg. Time (s) | 10.26 | 4.68 | 2.44 (24.54) | 10.33 | 12.13 | 2.13 |
| Appearance (8596) | Avg. Accuracy (%) | - | 100 | 83.3 | 70.0 | 91.0 | 52.1 |
| | Avg. Time (s) | 0.44 | 0.40 | 0.17 (1.7) | 0.42 | 0.28 | 0.15 |

Table 4: Performance of JBLD on different datasets and against various other metrics for 1-NN query using exhaustive search, averaged over 1K queries. Note that for the appearance dataset, we used AIRM as the baseline (and thus its accuracy is not shown). Avg. time is in seconds for going over the entire dataset once to find the NN. The time taken for the offline log-Euclidean projections is shown in brackets under LERM.

5.5.1  Accuracy 

We divided each of the datasets into database and query sets, and then computed accuracy against either the available ground truth or the baseline computed using AIRM. The query set typically consisted of 1K covariances. The results are shown in Table 4. JBLD matches or exceeds the accuracy of every other metric on all four datasets, without compromising much on the speed of retrieval. In the case of LERM, we had to vectorize the covariances using the log-Euclidean projections for tractability of the application. The time taken for this operation for each of the datasets is also shown in the table. Since this embedding uses the eigendecomposition of the matrices, this operation is computationally expensive, limiting the suitability of LERM for real-time applications. We also compare the performance of JBLD against other distances such as the Cholesky (CHOL) distance and the Frobenius (FROB) distance. The Frobenius distance was seen to perform poorly in all our experiments, although, as expected, it is the fastest. The numerical results are averaged over 10 trials, each time using a different database and query set.

5.5.2  Accuracy  @K 

We  take  the  previous  experiments  of  1-NN  a  step  further  and  present  results  on  K-NN 
retrieval  for  an  increasing  K.  The  idea  is  to  generalize  the  power  of  1-NN  to  a  K-NN 
application.  We  plot  in  Figure  8,  the  results  of  Accuracy@K,  where  the  maximum 
value  of  K  is  determined  by  the  cardinality  of  a  ground  truth  class.  The  plots  clearly 
show  that  JBLD  performs  well  against  almost  all  other  metrics  in  terms  of  accuracy  for 
increasing  K . 

Figure 8: Accuracy@K plots for (a) texture dataset, (b) activity dataset, (c) faces dataset.


5.6  NN  Performance  Using  Metric  Tree 

Building the Tree: The time required to build the NN data structure plays a critical role in the deployment of a measure. In Table 5, we compare the build time of the metric tree for each of the datasets using JBLD against AIRM. As is clear from the table, the performance of AIRM is poor and worsens with increasing matrix dimensions (see the face dataset). JBLD, on the other hand, takes far less time to initialize and shows consistent performance even against increasing dataset size and matrix dimensions.


| Dataset (size) | AIRM | JBLD |
|---|---|---|
| Texture (25852) | 769.96 | 131.31 |
| Activity (62425) | 2985.62 | 746.67 |
| Faces (29700) | 13776.30 | 854.33 |
| People (8596) | 213.41 | 53.165 |

Table 5: Comparison of metric tree buildup times (in seconds) for the various datasets.


5.7  NN  Retrieval 

5.7.1  Exact  NN  via  Metric  Tree 

Next, we compare the accuracy and the speed of retrieval of JBLD against the other metrics using the metric tree. For this experiment, we used a metric tree with four branches at each internal node and 1K leaf nodes, for all the datasets. Since kmeans using AIRM took too long to converge (on the face dataset, with approximately 26K covariances of size 48 × 48, it took more than 3 hours), we decided to stop the clustering process when fewer than 10% of the data points changed clusters in the underlying Lloyd's algorithm. This configuration was imposed on kmeans using the other metrics as well, for fairness of comparison. We show in Table 6 the average results of 1-NN using the metric tree with 500 queries, with averages computed over 10 trials, each time using a different sample set for the database and the query. As is clear from the table, JBLD provides accuracy equal to AIRM with at least a 1.5 times speedup for matrices of small size, and more than a 7 times speedup for the face dataset. The retrieval speed of LERM and FROB is high, while their accuracy is low. KLDM was seen to provide accuracy similar to JBLD, but with low retrieval speed. In short, JBLD seems to provide the best mix of accuracy and computational expense.
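
The early-stopping rule described above can be sketched with a generic Lloyd loop that halts once the fraction of points changing cluster drops below a threshold. This is a minimal sketch that reads the 10% rule as a fraction of reassigned points per iteration; the function name `lloyd`, its parameters, and the one-dimensional toy data are hypothetical, and `dist` and `mean` are placeholders for whichever divergence and centroid computation is being tested.

```python
def lloyd(points, centroids, dist, mean, move_tol=0.10, max_iter=100):
    """Lloyd's algorithm that stops when fewer than move_tol of the points move."""
    centroids = list(centroids)          # do not modify the caller's list
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: nearest centroid under the supplied distance.
        new_labels = [
            min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points
        ]
        moved = sum(a != b for a, b in zip(labels, new_labels))
        labels = new_labels
        if moved < move_tol * len(points):
            break                        # fewer than 10% of points moved
        # Update step: recompute the centroid of each non-empty cluster.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = mean(members)
    return labels, centroids


pts = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
labels, cents = lloyd(
    pts, [0.0, 1.0],
    dist=lambda a, b: abs(a - b),
    mean=lambda xs: sum(xs) / len(xs),
)
```

On this toy input the loop stops on the third pass, once no point changes cluster, with the two groups separated and centroids at 1.0 and 11.0.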

5.7.2  Approximate  NN  via  Metric  Tree 

It is well known that the worst-case computational complexity of a metric tree search
is linear in the number of data points. Thus, in Table 7, we also evaluate the
performance of an approximate variant of metric tree based retrieval in which the
search for NNs while backtracking the metric tree is limited to at most n items; in
our experiments we used n = 5. This heuristic is in fact a variant of the well-known
Best-Bin-First (BBF) [52] method, the idea being to sacrifice a little accuracy for a
large speedup in retrieval. As is clear from the table, such a simple heuristic can
provide a speedup of approximately 100 times over exact NN, without much loss in
accuracy. The table also shows that JBLD gives the best accuracy among the metrics,
with reasonable retrieval times.
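
The budgeted backtracking idea can be sketched as below. This is a simplified illustration under stated assumptions, not the tree used in the experiments: the tree is a binary pivot-based ball tree rather than a kmeans tree, the budget is interpreted as the number of leaves examined (`max_leaves`, a hypothetical parameter), and all names are illustrative. Nodes are visited in order of a triangle-inequality lower bound, so `dist` must be a true metric for the exact search to be correct.

```python
import heapq
import numpy as np

class Node:
    """Tree node: a pivot index, a covering radius, and either items or children."""
    def __init__(self, center, radius, items=None, children=()):
        self.center = center      # index of the pivot point
        self.radius = radius      # max distance from the pivot to any member
        self.items = items        # list of point indices at a leaf, else None
        self.children = children

def build_tree(points, idx, dist, leaf_size=8, rng=None):
    rng = rng if rng is not None else np.random.default_rng(0)
    a = int(rng.choice(idx))
    radius = max(dist(points[a], points[i]) for i in idx)
    if len(idx) <= leaf_size or radius == 0.0:
        return Node(a, radius, items=list(idx))
    # Second pivot: the member farthest from the first pivot.
    b = max(idx, key=lambda i: dist(points[a], points[i]))
    left = [i for i in idx if dist(points[i], points[a]) <= dist(points[i], points[b])]
    right = [i for i in idx if dist(points[i], points[a]) > dist(points[i], points[b])]
    return Node(a, radius, children=(build_tree(points, left, dist, leaf_size, rng),
                                     build_tree(points, right, dist, leaf_size, rng)))

def nn_search(root, q, points, dist, max_leaves=None):
    """Best-first NN search; max_leaves=None gives exact search."""
    best_i, best_d = -1, float("inf")
    heap = [(0.0, 0, root)]       # (lower bound, tie-breaker, node)
    counter, leaves = 1, 0
    while heap:
        bound, _, node = heapq.heappop(heap)
        if bound >= best_d:
            break                 # no remaining node can contain a closer point
        if node.items is not None:
            for i in node.items:
                d = dist(q, points[i])
                if d < best_d:
                    best_i, best_d = i, d
            leaves += 1
            if max_leaves is not None and leaves >= max_leaves:
                break             # budget exhausted: return the best found so far
        else:
            for child in node.children:
                # Triangle inequality: d(q, x) >= d(q, pivot) - radius for x in child.
                lb = max(0.0, dist(q, points[child.center]) - child.radius)
                heapq.heappush(heap, (lb, counter, child))
                counter += 1
    return best_i, best_d
```

Calling `nn_search(tree, q, points, dist, max_leaves=5)` trades exactness for a bounded amount of backtracking: the returned neighbor is never closer than the true nearest neighbor, and may be slightly farther.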


5.8  Summary  of  Results 

Here we summarize our findings about JBLD and the other metrics with regard to our
experiments. As is clear from the above tables and plots, JBLD was seen to provide the
best accuracy compared to the other metrics, with accuracies sometimes even surpassing
that of the Riemannian metric. It might seem from Table 7 that the retrieval speed
of JBLD is close to that of AIRM; this result needs to be seen together with the results
in Table 5, which show that building a metric tree for AIRM is extremely challenging,
especially when the data is high dimensional. KLDM sometimes matches the accuracy
of JBLD, and exhibits higher errors at other times. However, it always runs slower than
JBLD, in some cases requiring more than twice as much computational time. LERM seemed
superior in retrieval speed due to the capability of offline computations, but was seen
to have lower accuracy. Finally, FROB was found to be the fastest, as would
be expected, but has the lowest accuracy. In summary, JBLD is seen to provide the
most consistent results across all the experiments, with the best accuracy, good
scalability, and moderate retrieval speeds.
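
For reference, the two dissimilarities compared most closely above can be sketched with their standard definitions: JBLD(X, Y) = log det((X + Y)/2) - (1/2) log det(XY), and AIRM(X, Y) = ||log(X^{-1/2} Y X^{-1/2})||_F. The following is a minimal NumPy sketch with illustrative function names, not the code used in the experiments. It shows that JBLD needs only Cholesky factorizations, whereas AIRM requires a generalized eigenvalue computation.

```python
import numpy as np

def logdet_spd(a):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    chol = np.linalg.cholesky(a)
    return 2.0 * float(np.sum(np.log(np.diag(chol))))

def jbld(x, y):
    """Jensen-Bregman LogDet divergence between SPD matrices x and y."""
    return logdet_spd((x + y) / 2.0) - 0.5 * (logdet_spd(x) + logdet_spd(y))

def airm(x, y):
    """Affine-invariant Riemannian metric between SPD matrices x and y."""
    chol = np.linalg.cholesky(x)             # x = L L^T
    a = np.linalg.solve(chol, y)             # L^{-1} y
    m = np.linalg.solve(chol, a.T).T         # L^{-1} y L^{-T}
    m = (m + m.T) / 2.0                      # remove round-off asymmetry
    lam = np.linalg.eigvalsh(m)              # generalized eigenvalues of (y, x)
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))
```

Both functions return zero for identical inputs and are symmetric in their arguments. When a metric is required, for example for triangle-inequality pruning in a tree, the square root of JBLD is the quantity usually used.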

6  Conclusion 

We introduced a dissimilarity measure based on the Jensen-Bregman LogDet Divergence
(JBLD)  defined  over  the  set  of  positive-definite  (covariance)  matrices.  The  measure  has 
several  desirable  theoretical  properties  including  inequalities  relating  it  to  other  metrics 
for  covariances.  More  importantly,  it  was  shown  to  outperform  the  Riemannian  metric 
in  speed,  without  any  drop  in  accuracy.  Further,  we  showed  results  for  computing  the 
centroid  of  covariance  matrices  under  our  metric,  followed  by  an  application  to  nearest 
neighbor  retrieval  using  a  metric  tree.  Experiments  validated  the  effectiveness  of  the 
measure.  Going  forward,  we  would  like  to  investigate  the  applicability  of  JBLD  in 
classification  and  regression  settings. 

Dataset                AIRM      JBLD      LERM      KLDM      FROB
Texture    Acc. (%)    83.00     83.00     78.40     83.00     52.00
           Time (ms)   953.4     522.3     396.3     1199.6    522.0
Activity   Acc. (%)    98.8      99.00     95.80     98.60     85.60
           Time (ms)   3634.0    3273.8    1631.9    4266.6    1614.92
Faces      Acc. (%)    26.6      26.6      22.8      26.1      20.6
           Time (ms)   9756.1    1585.1    680.8     2617.7    658.6
People     Acc. (%)    --        100       92.0      98.1      43.3
           Time (ms)   354.3     229.7     214.2     701.1     163.7

Table 6: True NN using the metric tree. The results are averaged over 500 queries. Also refer
to Table 5 for comparing the metric tree creation time.


Dataset                AIRM      JBLD      LERM      KLDM      FROB
Texture    Acc. (%)    80.2      81.40     76.80     81.40     48.80
           Time (ms)   34.28     21.04     18.18     52.98     17.73
Activity   Acc. (%)    95.6      96.20     93.60     95.6      78.00
           Time (ms)   38.1      30.39     20.3      85.9      12.2
Faces      Acc. (%)    22.4      24.2      20.2      22.2      18.6
           Time (ms)   26.16     23.2      20.6      55.7      16.6
People     Acc. (%)    --        91.3      85.6      91.1      36.4
           Time (ms)   4.81      4.78      3.31      8.12      3.07

Table 7: ANN performance using the Best-Bin-First strategy on the metric tree. The results are
averaged over 500 queries. Also refer to Table 5 for comparing the metric tree creation time.